A simulation study of cross-validation for selecting an optimal cutpoint in univariate survival analysis.
Abstract
Continuous measurements are often dichotomized to classify subjects. This paper evaluates two procedures for determining the best cutpoint for a continuous prognostic factor when the outcome data are right censored. The first procedure selects the cutpoint that minimizes the significance level of a logrank test comparing the two groups defined by the cutpoint, and then adjusts that significance level for the maximal selection. The second procedure uses cross-validation and extends easily to accommodate additional prognostic factors. We compare the methods in terms of statistical power and bias in estimating the true relative risk associated with the prognostic factor. Both procedures produce approximately the correct type I error rate. Using a maximally selected cutpoint without adjusting the significance level, however, results in a substantially elevated type I error rate. Under the null hypothesis, the cross-validation procedure estimated the relative risk without bias, whereas the procedure based on the maximally selected test was biased upward. When the relative risk between the two groups defined by the covariate and the true changepoint was small, the cross-validation procedure provided greater power than the maximally selected test. The cross-validation estimate of relative risk was unbiased, while the procedure based on the maximally selected test produced a biased estimate. As the true relative risk increased, the power of the maximally selected test was about 10 per cent greater than that obtained with cross-validation; however, the maximally selected test overestimated the relative risk by about 10 per cent, while the cross-validation procedure underestimated it by at most 5 per cent. Finally, we report the effect of dichotomizing a continuous, non-linear relationship between covariate and risk, comparing a linear proportional hazards model with models based on optimally selected cutpoints. Our simulation study indicates that cutpoint models can lose substantial statistical power when the relationship between covariate and risk is continuous.
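The abstract describes both procedures only in words. Below is a minimal Python sketch of how they might be implemented. It rests on several assumptions that do not come from the paper: all function names are hypothetical; candidate cutpoints are restricted to the 10th to 90th percentiles of the covariate; the adjustment for maximal selection is done by permutation, which may differ from the correction the authors used; the cross-validation scheme (choose a cutpoint on K-1 folds, classify the held-out fold with it, then run one logrank test on the pooled assignments) is one plausible reading of "a cross-validation approach"; and relative risk is estimated with the logrank ratio (O1/E1)/(O0/E0) rather than a fitted proportional hazards model.

```python
import math
import random


def logrank(times, events, group):
    """Two-sample logrank test of group True vs False (events: 1 = observed, 0 = censored).

    Returns (chi2, rr): the 1-df chi-square and rr = (O1/E1) / (O0/E0) (assumed RR estimate).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o1 = e1 = var = 0.0
    for t in event_times:
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, group):
            if ti >= t:  # still at risk just before time t
                n += 1
                n1 += int(gi)
                if ti == t and ei:
                    d += 1
                    d1 += int(gi)
        o1 += d1
        e1 += d * n1 / n  # expected events in group True under H0
        if n > 1:  # hypergeometric variance, allowing for ties
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (o1 - e1) ** 2 / var if var > 0 else 0.0
    o_total = sum(1 for e in events if e)
    o0, e0 = o_total - o1, o_total - e1
    rr = (o1 * e0) / (e1 * o0) if e1 > 0 and o0 > 0 else float("nan")
    return chi2, rr


def p_chi2_1df(chi2):
    """Nominal p-value of a 1-df chi-square, with NO adjustment for the cutpoint search."""
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutpoint(x, times, events, lo_q=0.1, hi_q=0.9):
    """Maximally selected logrank chi-square over cutpoints c (groups x > c vs x <= c)."""
    srt = sorted(x)
    lo, hi = srt[int(lo_q * (len(x) - 1))], srt[int(hi_q * (len(x) - 1))]
    best_c, best_chi2 = srt[(len(x) - 1) // 2], 0.0  # median if nothing beats chi2 = 0
    for c in sorted(set(x)):
        if lo <= c < hi:  # assumed rule: skip extreme cutpoints that give tiny groups
            chi2, _ = logrank(times, events, [xi > c for xi in x])
            if chi2 > best_chi2:
                best_c, best_chi2 = c, chi2
    return best_c, best_chi2


def max_selected_test(x, times, events, n_perm=99, seed=0):
    """Best cutpoint plus a permutation p-value for the search (assumed adjustment)."""
    rng = random.Random(seed)
    c, observed = best_cutpoint(x, times, events)
    xp, hits = list(x), 0
    for _ in range(n_perm):
        rng.shuffle(xp)  # break any covariate-outcome link, then redo the whole search
        if best_cutpoint(xp, times, events)[1] >= observed:
            hits += 1
    return c, observed, (hits + 1) / (n_perm + 1)


def cross_validated_groups(x, times, events, k=5, seed=0):
    """Assumed K-fold scheme: classify each subject by a cutpoint chosen without its outcome."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    group = [False] * len(x)
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        c, _ = best_cutpoint([x[i] for i in train], [times[i] for i in train],
                             [events[i] for i in train])
        for i in held_out:
            group[i] = x[i] > c
    return group


if __name__ == "__main__":
    # Hypothetical data: hazard 3 when x > 0.5, else 1; censoring at t = 1.
    rng = random.Random(1)
    x = [rng.random() for _ in range(40)]
    raw = [rng.expovariate(3.0 if xi > 0.5 else 1.0) for xi in x]
    times, events = [min(t, 1.0) for t in raw], [int(t < 1.0) for t in raw]
    c, chi2, p_adj = max_selected_test(x, times, events, n_perm=49)
    print(f"cutpoint={c:.3f} chi2={chi2:.2f} "
          f"nominal p={p_chi2_1df(chi2):.4f} permutation-adjusted p={p_adj:.3f}")
    cv_chi2, cv_rr = logrank(times, events, cross_validated_groups(x, times, events))
    print(f"cross-validated: chi2={cv_chi2:.2f} RR={cv_rr:.2f}")
```

Comparing the nominal p-value with the permutation-adjusted one shows why reporting an unadjusted, maximally selected cutpoint inflates the type I error rate. The sketch does not calibrate the cross-validated statistic; one option (again an assumption) would be to permute the covariate and repeat the entire cross-validation on each permutation.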
Journal: Statistics in Medicine
Volume 15, Issue 20
Publication date: 1996